ABSTRACT

The true state of the system described here is characterized by a
probability vector. At each stage of the system an action must be
chosen from a finite set of actions. Each possible action yields an
expected reward, transforms the system to a new state in accordance
with a Markov transition matrix, and yields an observable outcome.
The problem of finding the total maximum discounted reward as a
function of the probability state vector may be formulated as a
linear program with an infinite number of constraints. The reward
function may be expressed as a partial N-dimensional Maclaurin
series. The coefficients in this series are also determined as an
optimal solution to a linear program with an infinite number of
constraints. A sequence of related finitely constrained linear
programs is solved, which then generates a sequence of solutions
that converge to a local minimum for the infinitely constrained
program. This model is applicable to computer assisted instruction
systems as well as to other situations. (Author/CH)
SUMMARY

This is the last in a series of technical reports concerned with
mathematical approaches to instructional sequence optimization in
instructional systems. The problem treated here is very closely
related to that treated by Smallwood and Sondik (4). Both papers deal
with Markov decision processes where the true state of the system is
not known with certainty. Hence the state of the system is
characterized by a probability vector. Each action yields an expected
reward, transforms the system to a new state and yields an observable
outcome. One wishes to determine an action for each probability state
vector so as to maximize the total expected reward. Smallwood and
Sondik (4) solve this problem exactly for a finite time horizon. This
report treats the infinite time horizon with a discount factor, using
a partial N-dimensional Maclaurin series to approximate the total
optimal reward as a function of the probability state vector. While
this model was developed for computer aided instruction, it is
applicable to other situations as well. This model also is of
considerable theoretical value.
ABSTRACT

This paper describes a system that may be in any one of states
1, 2, ..., N. The true state of the system is not known with certainty,
and consequently is described by a probability vector. At each stage
an action must be chosen from a finite set. Each possible action
returns an expected reward, transforms the system to a new state in
accordance with a Markov transition matrix, and yields an observable
outcome. It is required to determine an action for each possible
state vector in order to maximize the total expected reward over an
infinite time horizon under a discount factor, $\beta$, where
$0 < \beta < 1$.

The problem of finding the total maximum discounted reward as
a function of the probability state vector may be formulated as a
linear program with an infinite number of constraints. The reward
function may be expressed as an N-dimensional Maclaurin series, and
in this paper it is approximated by a partial series consisting of
terms up to degree n. The coefficients in this series are also
determined as an optimal solution to a linear program with an infinite
number of constraints. A sequence of related finitely constrained
linear programs is solved, which generates a sequence of solutions
that converge to a local minimum for the infinitely constrained
program. It is an open question as to whether this local minimum is
actually a global minimum. However, it should be noted that the
function being approximated is convex and consequently has the
property that any local minimum is a global one as well.
PARTIALLY OBSERVABLE MARKOV DECISION
PROCESSES OVER AN INFINITE PLANNING
HORIZON WITH DISCOUNTING

I. Introduction

This paper describes a system that may be in any one of states
1, 2, ..., N. The true state of the system is not known with certainty
and consequently is described by a probability vector. At each stage
an action must be chosen from a finite set. This action returns an
expected reward, transforms the system to a new (but not necessarily
different) state according to a Markov process, and yields an
observable outcome. The problem addressed here is that of determining
an action for each possible state vector in order to maximize the total
expected reward over an infinite horizon under a discount factor,
$\beta$, where $0 < \beta < 1$.

Smallwood and Sondik (4) have treated this problem for the
finite horizon case without a discount factor and have determined that
the total maximum expected reward is a piecewise linear function of
the probability state vector. Their results can be trivially extended
to include the discount case.

The observable state case, that is, the case where the true
state of the system is known with certainty, has been treated
extensively. For both the finite and infinite horizon under a discount
factor, Howard (1) developed a policy improvement routine for
determining an optimal action and the optimal cost for each state.
II. Formulation

In this formulation, the notation of Smallwood and Sondik will
be used. It is assumed that this system can be modeled by an N-state
discrete time Markov decision process.

The observed state of the system is characterized by a
probability vector $\pi$, where $\pi_i$ is the probability the true
state of the system is $i$.

At each point in time an action must be selected from a finite
set. Associated with an action, $a$, is a probability transition matrix
$P^a$, where $p^a_{ij}$ is the conditional probability the system will
make its next transition to state $j$ given the current state is $i$
and action $a$ is taken. An observed outcome follows each action, with
$r^a_{j\theta}$ denoting the probability of observing output $\theta$
given the new state of the system is $j$ and action $a$ was taken. In
addition an immediate reward $w^a_{ij\theta}$ is incurred if action $a$
is taken, output $\theta$ is observed, and the system makes the
transition from state $i$ to state $j$. Thus if action $a$ is taken and
output $\theta$ is observed, the new state is $\pi'$, where

$$\pi'_j = \frac{\sum_i \pi_i p^a_{ij} r^a_{j\theta}}{\sum_{i,j} \pi_i p^a_{ij} r^a_{j\theta}} \tag{1}$$

The above transformation is summarized by

$$\pi' = T(\pi \mid a, \theta) \tag{2}$$
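The transformation (1)-(2) can be sketched in a few lines of Python. This is only an illustration for one fixed action $a$: the function name `belief_update`, the list-of-lists layout of `P` and `R`, and the two-state numbers are assumptions made here, not part of the report.

```python
def belief_update(pi, P, R, theta):
    """Return T(pi | a, theta) from equations (1)-(2) for one fixed action a.

    pi    : list of N state probabilities
    P     : N x N matrix, P[i][j] = p^a_ij (hypothetical layout)
    R     : N x M matrix, R[j][theta] = r^a_{j,theta} (hypothetical layout)
    theta : index of the observed output
    """
    n = len(pi)
    # Numerator of (1): sum_i pi_i * p_ij * r_{j,theta}, for each new state j
    joint = [sum(pi[i] * P[i][j] for i in range(n)) * R[j][theta] for j in range(n)]
    total = sum(joint)  # denominator of (1): probability of observing theta
    if total == 0.0:
        raise ValueError("output theta has zero probability under pi")
    return [x / total for x in joint]


# Hypothetical two-state, two-output example
P = [[0.9, 0.1], [0.2, 0.8]]
R = [[0.7, 0.3], [0.1, 0.9]]
new_pi = belief_update([0.5, 0.5], P, R, theta=0)
```

The denominator is the probability of observing $\theta$, so the update is undefined for outputs that cannot occur; the sketch raises an error in that case rather than guessing a value.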

A policy is a rule that assigns an action to each possible state
vector. It is required to find a policy that maximizes the expected
discounted rewards over all periods for each possible state vector. Let
$V(\pi)$ be the total discounted reward associated with such a policy.
Then $V(\pi)$ must satisfy the following recursive equation.

$$V(\pi) = \max_a \sum_i \pi_i \sum_j p^a_{ij} \sum_\theta r^a_{j\theta} \left[ w^a_{ij\theta} + \beta V(T(\pi \mid a, \theta)) \right] \tag{3}$$

Letting

$$q^a_i = \sum_{j,\theta} p^a_{ij} r^a_{j\theta} w^a_{ij\theta} \tag{4}$$

equation (3) is simplified somewhat to equation (5).

$$V(\pi) = \max_a \left[ \sum_i \pi_i q^a_i + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta} V(T(\pi \mid a, \theta)) \right] \tag{5}$$

Once the function for $V(\pi)$ is known, an optimal action for $\pi$ can
be determined as one which maximizes the right hand side of (5).
III. A Learning Example

As an illustration, it will be shown how the system described
in the previous section may be applied to the human learning process.

Consider a course which is given in several levels of
instruction. The levels are denoted 1, 2, ..., N, with N being the
easiest and 1 the hardest. The structure of the levels is a definite
hierarchy in the sense that if a student knows the material at level i
he must also know the material at any level j > i. Several examples
where this situation may apply follow.

The first situation is one where the material covered at one
level includes all that covered at preceding levels, plus some
additional material. An example of this is a program developed at
Behavioral Technology Laboratories (BTL) to teach students Kirchhoff's
laws. This course is comprised of eleven levels, from the lowest level,
defining the units for voltage, current and resistance, up to the
highest level, which deals with the application of Ohm's Law and
Kirchhoff's voltage and current laws in complex networks. Another
program developed at BTL is a short course in trigonometry consisting
of five levels. At the lowest level students are given the definitions
of the six basic trigonometric ratios. Then the student is given a
right triangle in which the lengths of the sides are determined by a
random number generator and the student is asked to determine these
ratios for one of the acute angles. Succeeding levels deal with
material on relationships between these ratios and problems testing
the student's knowledge of these relationships.

A second situation is one where the material and problems covered
at a particular level are virtually the same as the immediately
preceding level except more clues and hints are given at the easier
level. A good example of this is a version of the Kirchhoff's laws
program considered earlier at BTL, in which problems would be given by
level as follows:

1. Problems are given in steps with cues and knowledge of
results at each step.

2. Problems are given in steps with no cues or knowledge of
results at each step.

3. The student solves problems in steps but he chooses the
steps.

4. The student is simply given problems and asked to solve
them.

A third situation is one in which a student is to be drilled
in a skill in order that he be able to perform it rapidly. Thus the
exercises are virtually the same at all levels but the time constraints
are tighter at the higher levels. In the BTL intercept trainer for
the radar intercept observer function, the student is trying to fire
a missile at the nose of a target and then turn around and fire another
missile at the tail of that aircraft. The first missile is a radar
guided missile fired when in the forward quarter and the second a heat
seeker fired when in the rear quarter of the enemy aircraft. He is
given a radar reading and must correct his angle of approach so as to
be on a lead collision course that will insure a high hit probability
when he fires the missile. At higher levels the student is given such
problems at faster aircraft speeds.

Note, however, the assumption given for this model would not
be applicable for the situation where a given level did not use certain
material introduced at preceding levels.





A student is in state i if he knows the material of level i
but not at any level more difficult than i, and in state N+1 if he does
not know the material at any level.

There are N actions, and action i consists of instructing the
student in the material of level i and then giving the student a test
on that material. For each action there are two possible outcomes:
either the student passes the test or he fails it. The objective is
to develop an adaptive instructional sequence so that the student
demonstrates knowledge of the material at level 1 as quickly as
possible. Knowledge at level 1 is demonstrated by passing a test on the
material at level 1. The reward $w^a_{ij\theta}$ would be the negative
of the expected time it would take to obtain instruction at level $a$
when the system goes from state $i$ to state $j$ and $\theta$ (success
or failure at $a$) is observed. For completeness a trap state $\phi$
would be needed. The student goes to state $\phi$ with probability one
once he successfully completes the material at level 1. The only action
in state $\phi$ is to do nothing, which yields zero reward and keeps
the student in state $\phi$ with probability one.

Wollmer (6) treats the more restricted problem where
$p^a_{ij} = 0$ unless $i = j$, or $a = i+1$ and $j = i+1$. Thus if a
student is in state $i$, he remains in state $i$ unless he receives
instruction at level $i+1$, in which case he either remains in state
$i$ or advances to state $i+1$. This would not allow the possibility of
forgetting.

Other situations where partially observable Markov decision
processes occur are in machine replacement, decoding from sources
transmitting over a noisy channel, medical diagnosis, and searching for
a moving object.

Note that if the assumption of a strict hierarchy in levels
were dropped, the set of states would expand from $N+2$ to $2^N + 1$,
including the trap state.



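In this section it will be shown that a maximum reward function
exists and that it is a convex function of the state vector $\pi$.

Let $V_n(\pi)$ be the maximum reward function for the $n$ period
horizon. Then, with $q^a_i$ as defined in (4),

$$V_n(\pi) = \max_a \left[ \sum_i \pi_i q^a_i + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta} V_{n-1}(T(\pi \mid a, \theta)) \right] \tag{6}$$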
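Recursion (6) can be evaluated directly for a small horizon. The following is a minimal sketch, assuming $V_0(\pi) = 0$ and a hypothetical model stored as one `(P, R, q)` triple per action; the names `posterior`, `value`, and `model` and all numbers are illustrative only. The cost grows exponentially with $n$, so this is a check on the recursion rather than a practical solution method.

```python
def posterior(pi, P, R, theta):
    """Return (T(pi | a, theta), probability of observing theta)."""
    n = len(pi)
    joint = [sum(pi[i] * P[i][j] for i in range(n)) * R[j][theta] for j in range(n)]
    prob = sum(joint)
    if prob == 0.0:
        return list(pi), 0.0  # output cannot occur; it contributes nothing below
    return [x / prob for x in joint], prob


def value(pi, n, model, beta):
    """V_n(pi) by direct recursion on equation (6), with V_0 = 0."""
    if n == 0:
        return 0.0
    best = float("-inf")
    for P, R, q in model:  # one (P, R, q) triple per action a
        total = sum(p * qi for p, qi in zip(pi, q))  # sum_i pi_i q_i^a
        for theta in range(len(R[0])):
            nxt, prob = posterior(pi, P, R, theta)
            if prob > 0.0:
                # sum_{i,j} pi_i p_ij r_{j,theta} equals the probability of theta
                total += beta * prob * value(nxt, n - 1, model, beta)
        best = max(best, total)
    return best


# Hypothetical two-state, two-action, two-output model
model = [
    ([[0.9, 0.1], [0.2, 0.8]], [[0.7, 0.3], [0.1, 0.9]], [1.0, 0.0]),
    ([[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]], [0.0, 2.0]),
]
beta = 0.9
v1 = value([0.5, 0.5], 1, model, beta)
```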



Smallwood and Sondik (4) have shown that $V_n(\pi)$ is¹
1. Convex
2. Piecewise linear

It will be shown that $\lim_{n \to \infty} V_n(\pi)$ exists and is
convex in $\pi$.

Define $f_n$ so that $|V_n(\pi) - V_{n-1}(\pi)| \le f_n$ for all
$\pi$ and $f_n$ is the smallest real number with this property, and let
$V_0(\pi) = 0$. The $f_n$'s are well defined since all $V_n(\pi)$ are
bounded above and below.

Lemma 1: $f_{n+1} \le \beta f_n$.

Proof: Choose $a(\pi)$ as the action that maximizes the right
hand side of (6) for $V_{n+1}(\pi)$ if $V_{n+1}(\pi) > V_n(\pi)$, or
for $V_n(\pi)$ otherwise. Then

$$|V_{n+1}(\pi) - V_n(\pi)| \le \left| \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta} \left( V_n[T(\pi \mid a, \theta)] - V_{n-1}[T(\pi \mid a, \theta)] \right) \right| \le \beta f_n .$$

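Corollary 1: For $n' > n$, $|V_{n'}(\pi) - V_n(\pi)| \le \epsilon(n)$,
where $\epsilon(n) \to 0$ as $n \to \infty$.

Proof: From Lemma 1, $f_n \le \beta^{n-1} f_1$ and consequently

$$|V_{n'}(\pi) - V_n(\pi)| \le \sum_{i=n+1}^{n'} f_i \le \beta^n f_1 \sum_{i=0}^{\infty} \beta^i = \frac{f_1 \beta^n}{1-\beta} .$$

¹ While Smallwood and Sondik assume $\beta = 1$, their results hold for
$0 < \beta < 1$.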
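The bound in Corollary 1 also indicates how long a horizon is needed before $V_n$ is within a given tolerance of every later $V_{n'}$. The small sketch below assumes $\epsilon(n) = f_1 \beta^n / (1-\beta)$, as in the proof; the function name `horizon_for_tolerance` and the sample numbers are hypothetical.

```python
def horizon_for_tolerance(f1, beta, eps):
    """Smallest n with f1 * beta**n / (1 - beta) <= eps (bound from Corollary 1)."""
    if not (0.0 < beta < 1.0) or f1 <= 0.0 or eps <= 0.0:
        raise ValueError("need 0 < beta < 1, f1 > 0 and eps > 0")
    n = 0
    while f1 * beta ** n / (1.0 - beta) > eps:
        n += 1
    return n


# With f1 = 1 and beta = 0.5 the bound is 2 * 0.5**n
n_needed = horizon_for_tolerance(1.0, 0.5, 0.01)
```

Because the bound decays geometrically, the required horizon grows only logarithmically as the tolerance shrinks.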



Theorem 1: The function $V_n(\pi)$ is absolutely convergent.

Proof: Choose any particular $\pi$. By Corollary 1, the sequence
$V_n(\pi)$ is bounded above and below and hence has an infinite
convergent subsequence with limit $V^*(\pi)$. Choose $\epsilon > 0$ and
$n$ such that $\epsilon(N) < \epsilon$ for $N \ge n$, where
$\epsilon(n)$ is as defined in Corollary 1. For any $N \ge n$ and
$n' \ge n$ in the convergent subsequence, $|V_N(\pi) - V_{n'}(\pi)| < \epsilon$
and consequently $|V_N(\pi) - V^*(\pi)| \le \epsilon$. Since $n$ is
independent of $\pi$, the theorem is proven.

Thus $V^*(\pi) = V(\pi)$ is well defined.

Theorem 2: $V(\pi)$ is convex in $\pi$.

Proof: For two state vectors $\pi^1$ and $\pi^2$, define
$f(V, \pi^1, \pi^2) = V(\tfrac{1}{2}\pi^1 + \tfrac{1}{2}\pi^2) - \tfrac{1}{2}V(\pi^1) - \tfrac{1}{2}V(\pi^2)$.
Assume $V(\pi)$ is not convex and choose $\pi^1$ and $\pi^2$ such that
$f(V, \pi^1, \pi^2) = k > 0$. Choose $n$ such that $N > n$ implies
$|V_N(\pi) - V(\pi)| < k/2$. Then
$|f(V, \pi^1, \pi^2) - f(V_N, \pi^1, \pi^2)| < k$. Thus
$f(V_N, \pi^1, \pi^2) > 0$, which is impossible since $V_N(\pi)$ is
convex.

Note that the piecewise linear property of $V_n(\pi)$ does not imply
piecewise linearity of $V(\pi)$, as any continuous function may be
expressed as the limit of a sequence of piecewise linear functions.
V. Linear Program Formulation

In the case of the observable finite state Markov decision
processes with a discount factor, the problem of finding a maximum
return for each state may be formulated as a linear program. The
development of this may be found in Ross (6). In this section it is
shown that a modification of this formulation extends to the problem
formulated in Section II. Portions of the development which are similar
to the finite state case will be outlined but without rigorous proofs.

Consider the set $B$ of all continuous bounded functions defined
on $S = \{ \pi \mid \pi_i \ge 0 \text{ all } i,\ \sum_i \pi_i = 1 \}$.
Let the operator $A$ be defined on this set as follows:

$$Au(\pi) = \max_a \left[ \sum_i \pi_i q^a_i + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta}\, u(T(\pi \mid a, \theta)) \right] \tag{7}$$

Note that

1. $u \le v$ implies $Au \le Av$.

2. $Au \in B$ for all $u \in B$.

3. $A: B \to B$ is a contraction mapping on $B$.

The operator $A$ is the optimal return function for the one period
problem in which a terminal reward $u(\pi)$ is given for the terminal
state. Since $A: B \to B$ is a contraction mapping, it has a unique
fixed point, $V = AV$, with $V = \lim_{n \to \infty} A^n u$ for any
$u \in B$. By Equation (3), this unique fixed point must be the optimal
reward function. Let us consider any $u$ such that $Au \le u$. Then
$u \ge Au \ge A^2 u \ge \cdots \ge \lim_{n \to \infty} A^n u = V$. Thus
the optimal return function $V$ minimizes $u(\pi)$ for each $\pi \in S$
among all functions $u$ satisfying $Au \le u$.
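The three properties used above (monotonicity, contraction, and convergence of $A^n u$ to the fixed point) are easy to observe numerically in the finite, fully observable analogue of (7), where $u$ is a vector with one entry per state. The sketch below is that simplified analogue, not the belief-space operator itself; the names `apply_A` and `sup_dist` and the two-state data are assumptions made for illustration.

```python
def apply_A(u, P, q, beta):
    """Finite-state analogue of (7): (Au)_i = max_a [q[a][i] + beta * sum_j P[a][i][j] * u[j]]."""
    n = len(u)
    return [
        max(q[a][i] + beta * sum(P[a][i][j] * u[j] for j in range(n))
            for a in range(len(P)))
        for i in range(n)
    ]


def sup_dist(u, v):
    """Sup-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical two-state, two-action data: P[a][i][j] and q[a][i]
P = [[[0.9, 0.1], [0.4, 0.6]], [[0.2, 0.8], [0.5, 0.5]]]
q = [[1.0, 0.0], [0.5, 2.0]]
beta = 0.9

u = [50.0, 50.0]  # chosen large enough that Au <= u
iterates = [u]
for _ in range(200):
    iterates.append(apply_A(iterates[-1], P, q, beta))
v = iterates[-1]  # approximate fixed point, reached from above
```

Starting from a $u$ with $Au \le u$, every iterate stays above the fixed point, which is the observation that turns the search for $V$ into a minimization over functions satisfying $Au \le u$.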

In the finite state case, where the above conditions also hold,
it is noted that minimizing $u_i$ for each state $i$ may be
accomplished by minimizing the sum of the $u_i$'s. For this problem,
where such a sum would be infinite, the average value of $u(\pi)$ may
be minimized. Thus, finding the function $u(\pi)$ is equivalent to
solving the following infinitely constrained program.

Find min $Z$, $u$ such that

$$Z = \int \cdots \int_{\sum_i \pi_i = 1,\ \pi_i \ge 0} u(\pi)\, d\pi_N\, d\pi_{N-1}\, d\pi_{N-2} \cdots d\pi_1 \tag{8}$$

subject to

$$\sum_i \pi_i q^a_i + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta}\, u[T(\pi \mid a, \theta)] \le u(\pi) \quad \text{for all } a \text{ and all } \pi \in S \tag{9}$$

Since the function $u(\pi)$ is continuous and defined on a closed
bounded set, it may be expressed in an N-dimensional Maclaurin series:

$$V(\pi) = C_0 + \sum_{i_1, i_2, \ldots, i_N} C_{i_1, i_2, \ldots, i_N}\, \pi_1^{i_1} \pi_2^{i_2} \cdots \pi_N^{i_N} \tag{10}$$

If $V(\pi)$ is expressed as such a series or approximated by a
partial series consisting of terms up to degree $n$, the coefficient of
$C_{i_1, i_2, \ldots, i_N}$ in (8) is simply

$$\int \cdots \int \pi_1^{i_1} \pi_2^{i_2} \cdots \pi_N^{i_N}\, d\pi_N\, d\pi_{N-1} \cdots d\pi_1 \tag{11}$$



In evaluating the integral the following lemma is needed.

Lemma 2:

$$\int_0^a (a-x)^m x^n\, dx = \frac{m!\, n!}{(m+n+1)!}\, a^{m+n+1}$$

Proof: Integrating by parts one obtains for the above integral

$$\left[ -\frac{(a-x)^{m+1} x^n}{m+1} \right]_0^a + \frac{n}{m+1} \int_0^a (a-x)^{m+1} x^{n-1}\, dx = \frac{n}{m+1} \int_0^a (a-x)^{m+1} x^{n-1}\, dx .$$

Applying this relationship recursively, one obtains

$$\frac{m!\, n!}{(m+n)!} \int_0^a (a-x)^{m+n}\, dx = \frac{m!\, n!}{(m+n+1)!}\, a^{m+n+1} .$$
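Lemma 2 can be checked in exact rational arithmetic by expanding $(a-x)^m$ with the binomial theorem and integrating term by term. The sketch below does this for small exponents; the function names `integral_exact` and `lemma2` are introduced here only for the check.

```python
from fractions import Fraction
from math import comb, factorial


def integral_exact(a, m, n):
    """Integrate (a - x)^m x^n over [0, a] by binomial expansion, exactly."""
    a = Fraction(a)
    total = Fraction(0)
    for k in range(m + 1):
        # term C(m, k) a^(m-k) (-x)^k x^n integrates to
        # C(m, k) a^(m-k) (-1)^k a^(n+k+1) / (n+k+1)
        total += comb(m, k) * a ** (m - k) * (-1) ** k * a ** (n + k + 1) / (n + k + 1)
    return total


def lemma2(a, m, n):
    """Closed form claimed by Lemma 2."""
    return Fraction(factorial(m) * factorial(n), factorial(m + n + 1)) * Fraction(a) ** (m + n + 1)


# Example: m = n = 1, a = 1 gives the integral of (1 - x) x over [0, 1]
example = integral_exact(1, 1, 1)
```

Using `Fraction` avoids any rounding, so agreement between the two functions is an exact identity for the cases tried rather than a numerical coincidence.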

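From this lemma expression (11) can be evaluated.

Theorem 3: The value of expression (11) is

$$\prod_{j=1}^{N} i_j! \Big/ \Big[ \sum_{j=1}^{N} (i_j + 1) \Big]!$$

Proof: Integrating (11) with respect to $\pi_N$ gives

$$\frac{i_N!}{(i_N + 1)!} \int \cdots \int \prod_{j=1}^{N-1} \pi_j^{i_j} \Big( 1 - \sum_{j=1}^{N-1} \pi_j \Big)^{i_N + 1} d\pi_{N-1} \cdots d\pi_1 .$$

Applying Lemma 2 with $a = 1 - \sum_{j=1}^{N-2} \pi_j$ and integrating
with respect to $\pi_{N-1}$ yields

$$\frac{i_N!\, i_{N-1}!}{(i_N + i_{N-1} + 2)!} \int \cdots \int \prod_{j=1}^{N-2} \pi_j^{i_j} \Big( 1 - \sum_{j=1}^{N-2} \pi_j \Big)^{i_N + i_{N-1} + 2} d\pi_{N-2} \cdots d\pi_1 .$$

Continual application of Lemma 2 yields
$\prod_{j=1}^{N} i_j! \big/ \big[ \sum_{j=1}^{N} (i_j + 1) \big]!$.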
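The closed form of Theorem 3 can be spot-checked for $N = 2$. The sketch assumes, as the first step of the proof suggests, that the last variable is integrated from 0 up to one minus the sum of the others, so the region is $\pi_1, \pi_2 \ge 0$, $\pi_1 + \pi_2 \le 1$. The function names `objective_coefficient` and `double_integral` are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def objective_coefficient(exponents):
    """prod_j i_j! / [sum_j (i_j + 1)]!, the value stated in Theorem 3."""
    numerator = 1
    for i in exponents:
        numerator *= factorial(i)
    return Fraction(numerator, factorial(sum(i + 1 for i in exponents)))


def double_integral(i, j):
    """Exact integral of x^i y^j over x, y >= 0, x + y <= 1 (the case N = 2)."""
    # Inner integral over y in [0, 1 - x] gives x^i (1 - x)^(j+1) / (j + 1);
    # expand (1 - x)^(j+1) binomially and integrate over x in [0, 1].
    total = Fraction(0)
    for t in range(j + 2):
        total += Fraction(comb(j + 1, t) * (-1) ** t, i + t + 1)
    return total / (j + 1)


# The constant term: the area of the unit right triangle
area = objective_coefficient((0, 0))
```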
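Thus if $V(\pi)$ is to be approximated by an $n$th degree polynomial
function in $\pi$, then substituting the expression of Theorem 3 and
(10) in (8) and (9) and rearranging terms yields:

Find $C_0$, $C_{i_1, i_2, \ldots, i_N}$, min $Z$ such that

$$Z = C_0 + \sum_{i_1, \ldots, i_N} \frac{\prod_{j=1}^{N} i_j!}{\big[ \sum_{j=1}^{N} (i_j + 1) \big]!}\; C_{i_1, i_2, \ldots, i_N} \tag{12}$$

$$(1 - \beta)\, C_0 + \sum_{i_1, \ldots, i_N} k_{i_1 i_2 \cdots i_N}\, C_{i_1, i_2, \ldots, i_N} \ge \sum_i \pi_i q^a_i \tag{13}$$

where

$$k_{i_1 i_2 \cdots i_N} = \prod_{j=1}^{N} \pi_j^{i_j} - \beta \sum_\theta \Big( \sum_{l,m} \pi_l p^a_{lm} r^a_{m\theta} \Big) \prod_{j=1}^{N} \big[ T_j(\pi \mid a, \theta) \big]^{i_j} \tag{14}$$

for all $a$, all $\pi \ge 0$ such that $\sum_i \pi_i = 1$. (15)

Here $T_j(\pi \mid a, \theta)$ denotes the $j$th component of
$T(\pi \mid a, \theta)$.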

Thus the problem of solving the program (8-9) with a multinomial
approximation of $u(\pi)$ becomes a linear program (12-15) with
an infinite number of constraints and unrestricted variables. Note
that the minimum value of $Z$ obtained in the linear program (12-15)
would actually be larger than that obtained in the program (8-9).



VI. Computational Procedure

Given an optimal solution to the linear program (12-15), consider
the set of constraints for which the $C_{i_1, \ldots, i_N}$ are basic.
If the program was solved with these constraints only, the same solution
would be obtained and all other constraints would be satisfied. Thus,
while the program consists of an infinite number of constraints, only
a finite number need to be included provided the correct ones are chosen.
This will be taken advantage of by solving the program with a finite
subset of the constraints, introducing an unsatisfied constraint, then
dropping any that are not binding, and continuing until an optimal
solution is obtained.

Let the quantity $F(\pi, C)$ be defined as follows:

$$F(\pi, C) = \min_a \Big[ (1 - \beta)\, C_0 + \sum_{i_1, \ldots, i_N} k_{i_1 i_2 \cdots i_N}\, C_{i_1, i_2, \ldots, i_N} - \sum_i \pi_i q^a_i \Big] \tag{16}$$

with $k_{i_1 i_2 \cdots i_N}$ as in (14). The constraints (13) are
equivalent to $F(\pi, C) \ge 0$ for all $\pi$. Thus if at least one
constraint is not satisfied for a given $C$ vector, the value of $\pi$
that minimizes $F(\pi, C)$ is the most unsatisfied one.

The procedure for solving the linear program (12-15) is given
in Algorithm 1.

Algorithm 1

1. Formulate the linear program with any finite subset of the
constraints in (13).

2. Solve the linear program for $C$.

3. Delete any constraints for which a slack variable is basic.

4. Solve the following non-linear program.

Find $\pi$, min $Z'$ such that

$$Z' = F(\pi, C) \tag{17}$$

$$\sum_{i=1}^{N} \pi_i = 1 \tag{18}$$

If $Z' \ge 0$, terminate as $C$ is optimal. Otherwise introduce the
constraint corresponding to the value of $\pi$ that optimizes (17-18) and
go back to Step 2.
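The control flow of Algorithm 1 can be sketched on a toy problem with a single variable. This is an illustration only, under several assumptions: the "linear program" is minimize $c$ subject to $c \ge g(x)$ for all $x$ in $[0, 1]$, so the restricted program of Step 2 is solved by a maximum; Step 4 is done by enumerating a grid of multiples of 1/100 instead of a non-linear search; and the names `constraint_generation`, `solve_master`, `slack`, and `g` are hypothetical.

```python
def constraint_generation(solve_master, slack, candidates, initial, tol=1e-9, max_iter=100):
    """Solve a program with infinitely many constraints via finite subsets (Algorithm 1 pattern)."""
    active = list(initial)                                   # Step 1: any finite subset
    for _ in range(max_iter):
        c = solve_master(active)                             # Step 2: solve restricted program
        active = [x for x in active if slack(c, x) <= tol]   # Step 3: drop non-binding constraints
        worst = min(candidates, key=lambda x: slack(c, x))   # Step 4: most unsatisfied constraint
        if slack(c, worst) >= -tol:
            return c, active                                 # no violated constraint: c is optimal
        active.append(worst)                                 # introduce it and repeat
    raise RuntimeError("no convergence within max_iter iterations")


# Toy instance: minimize c subject to c >= g(x) for every x in [0, 1]
def g(x):
    return x * (1.0 - x)

grid = [k / 100 for k in range(101)]
c_opt, binding = constraint_generation(
    solve_master=lambda pts: max(g(x) for x in pts),  # one-variable LP solved by a max
    slack=lambda c, x: c - g(x),                      # plays the role of F(pi, C)
    candidates=grid,
    initial=[0.0],
)
```

In the setting of the report, `solve_master` would be a call to a linear programming routine over the $C$ coefficients and the enumeration in Step 4 would be replaced by the search of Algorithm 2; neither is attempted here.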

A local optimum to (17-18) may be found by Algorithm 2.

Algorithm 2

1. Choose an arbitrary probability vector and evaluate $F(\pi, C)$.

2. Find an ordered pair $(i, j)$ such that increasing $\pi_i$ by
$\epsilon$ and decreasing $\pi_j$ by $\epsilon$ decreases $F(\pi, C)$
without violating $0 \le \pi_i \le 1$ and $0 \le \pi_j \le 1$. If no
such pair can be found, terminate as $\pi$ is a local optimum.

3. Increase $\pi_i$ to $\pi_i'$ and decrease $\pi_j$ to $\pi_j'$ such
that neither the pair $(i, j)$ nor $(j, i)$ satisfies the conditions of
Step 2. Then go back to Step 2.

For finiteness, the $\epsilon$ of Step 2 would be chosen ahead of time.

There are several ways of performing Step 3 to find the new
value of $\pi_i$ and $\pi_j$. One efficient way is to first bracket
$\pi_i$ and $\pi_j$ between $\pi_i', \pi_j'$ and $\pi_i'', \pi_j''$ and
continually reduce the difference between these by a factor of one
half, thus converging on a single point.

Initially $\pi_i'$ and $\pi_j'$ would be the current values of $\pi_i$
and $\pi_j$, and $\pi_i'' = \pi_i + \delta$, $\pi_j'' = \pi_j - \delta$,
where $\delta = \min[1 - \pi_i, \pi_j]$. Then consider the pair
$\bar\pi_i = \tfrac{1}{2}(\pi_i' + \pi_i'')$ and
$\bar\pi_j = \tfrac{1}{2}(\pi_j' + \pi_j'')$. If $F(\bar\pi, C)$ is a
local minimum under the restriction that all components of $\pi$ other
than $\pi_i$ and $\pi_j$ are held constant, then $\bar\pi$ is the
desired point. Otherwise, let $\bar\pi_i$ and $\bar\pi_j$ replace
$\pi_i'$ and $\pi_j'$ if the direction of decrease is towards $\pi_i''$
and $\pi_j''$, but let $\bar\pi_i$ and $\bar\pi_j$ replace $\pi_i''$ and
$\pi_j''$ if the direction of decrease is towards $\pi_i'$ and
$\pi_j'$. If neither direction yields a decrease, let $\bar\pi_i$ and
$\bar\pi_j$ replace $\pi_i'$ and $\pi_j'$ if $F(\pi') > F(\pi'')$, but
replace $\pi_i''$ and $\pi_j''$ otherwise. Step 3 would terminate when
$\pi_i'' - \pi_i' = \pi_j' - \pi_j'' < \epsilon$.
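The pairwise exchange search of Algorithm 2 can be sketched as follows. Two simplifications are assumed and should be read as such: Step 3 is carried out by repeating $\epsilon$-sized exchanges along the chosen pair until they stop helping, rather than by the bracketing scheme described above, and the objective `F` is a made-up quadratic standing in for $F(\pi, C)$. All names (`shift`, `improves`, `pairwise_descent`, `target`) are hypothetical.

```python
TOL = 1e-12  # guards against accepting moves that differ only by rounding noise


def shift(pi, i, j, step):
    """Return a copy of pi with pi_i increased and pi_j decreased by step."""
    out = list(pi)
    out[i] += step
    out[j] -= step
    return out


def improves(F, pi, trial, i, j):
    """Step 2 test: the exchange stays in [0, 1] and strictly lowers F."""
    in_bounds = trial[i] <= 1.0 + TOL and trial[j] >= -TOL
    return in_bounds and F(trial) < F(pi) - TOL


def pairwise_descent(F, pi, eps=1e-3, max_rounds=100_000):
    """Local minimum of F over probability vectors, at resolution eps."""
    pi = list(pi)
    pairs = [(i, j) for i in range(len(pi)) for j in range(len(pi)) if i != j]
    for _ in range(max_rounds):
        # Step 2: find an ordered pair whose eps-exchange decreases F
        found = next(((i, j) for i, j in pairs
                      if improves(F, pi, shift(pi, i, j, eps), i, j)), None)
        if found is None:
            return pi  # no improving pair: local optimum
        i, j = found
        # Step 3 (simplified): keep exchanging along (i, j) while F decreases
        trial = shift(pi, i, j, eps)
        while improves(F, pi, trial, i, j):
            pi, trial = trial, shift(trial, i, j, eps)
    raise RuntimeError("no convergence within max_rounds")


# Hypothetical stand-in objective with its minimum at `target`
target = [0.2, 0.3, 0.5]
F = lambda p: sum((x - t) ** 2 for x, t in zip(p, target))
result = pairwise_descent(F, [1.0, 0.0, 0.0])
```

Every exchange adds to one component what it removes from another, so the components continue to sum to one throughout and constraint (18) never has to be enforced separately.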

Note that if the $C$ vector approximation of $V(\pi)$ were exact,
any local minimum of $F(\pi, C)$ would be a global minimum due to the
convexity of $V(\pi)$. While this is not guaranteed in the
approximation, one could take random samples of $\pi$ in an attempt to
find a vector yielding a lower value of $Z'$ than the local minimum, or
evaluate $Z'$ for all $\pi$ vectors whose components are multiples of
$1/n$, where $n$ is large, if the result min $Z' \ge 0$ is obtained.

When introducing an unsatisfied constraint, it is recommended
that the dual simplex method be used to solve the resulting program,
which is already dual feasible.



The sequence of min $Z$ values generated by Algorithm 1 is
non-decreasing, bounded above, and hence must have a limit. It is an
open question as to whether this limit is the true min $Z$, or in
particular whether the sequence of $Z'$ values in Algorithm 2 tends to
zero. Consider the sequence of linear programs solved by Algorithm 1 and
assume the number of equations in each equals the number of components
in the $C$ vector plus one. It has already been shown that it will not
exceed this number, and if it is less, additional constraints with all
coefficients being zero may be added. Consider also the sequence of
matrices formed by the probability vectors that generate these
constraints. Since these are bounded above, these matrices, and
consequently the set of linear programs for Algorithm 1, must have a
convergent subsequence. Consider now the sequence of constraints
generated by this sequence in Algorithm 2. By the same argument this
sequence must have a convergent subsequence. In this latter sequence,
either $F(\pi, C) \to 0$ or else the cost coefficient in the pivot
column tends to zero, for if not the increase in min $Z$ would not tend
to zero, which is impossible since min $Z$ is bounded above.

If the sequence of $F(\pi, C)$ values generated by problem 2 did
not appear to tend to zero after many iterations while the change
in min $Z$ did appear to tend to zero, some possible ways out are as
follows. First, one may sample a large number of probability vectors
and find one which would give the largest increase in $Z$ on a single
pivot. Second, one may search all probability vectors that are
multiples of $1/n$, where $n$ is a large number, and find the one which
gives the largest increase in $Z$ for one pivot.

It should be noted that if the sequence of $Z'$ values obtained
in Algorithm 2 does not tend to zero, then one has a situation somewhat
analogous to cycling in the dual simplex method. Since cycling almost
never occurs in the primal simplex method, there appears to be some
basis for thinking that the sequence of $Z'$ values would tend to zero
the majority of times.

One could of course only consider constraints generated by
probability vectors whose components are multiples of $1/n$. By imposing
a lexicographic ordering, one could insure a true optimum in a finite
number of steps.



VII. Bounds on Accuracy

In solving the non-linear program (17-18) in Step 4 of the
algorithm to find the most unsatisfied constraint of the linear program
(12-15), one may wish to terminate the program when $Z' \ge -\delta$
rather than for $Z' \ge 0$, where $\delta$ is a small positive number.
If so, the value of $Z$ obtained for (12) will be less than the true
minimum for $Z$, since the program has been optimized for only a subset
of the constraints. However, it is easy to see from (12) and (13) that
increasing $C_0$ by $\delta/(1-\beta)$ yields a feasible solution and
increases $Z$ by that same amount. Consequently, this feasible solution
would come to within $\delta/(1-\beta)$ of minimizing $Z$.
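A short derivation may make the feasibility claim explicit. It is a sketch under the assumption that the terminated solution $u$ violates no constraint of (9) by more than $\delta$, and it uses the fact that $\sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta} = 1$. The symbol $c$ for the amount added to $C_0$, and $u'$ for the shifted function, are introduced here for illustration.

```latex
\begin{align*}
u(\pi) - \Big[ \sum_i \pi_i q^a_i
  + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta}\, u(T(\pi \mid a,\theta)) \Big]
  &\ge -\delta \quad \text{for all } a, \pi, \\
u'(\pi) &= u(\pi) + c, \\
u'(\pi) - \Big[ \sum_i \pi_i q^a_i
  + \beta \sum_{i,j,\theta} \pi_i p^a_{ij} r^a_{j\theta}\, u'(T(\pi \mid a,\theta)) \Big]
  &\ge -\delta + c - \beta c = -\delta + (1-\beta)\, c, \\
c = \frac{\delta}{1-\beta} \;&\Longrightarrow\; \text{every constraint is satisfied.}
\end{align*}
```

Adding a constant raises both sides of each constraint, but the right-hand side only by the discounted amount $\beta c$, which is why a shift of $\delta/(1-\beta)$ rather than $\delta$ is needed.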
The question now arises as to how close $v(\pi)$, the Maclaurin
series approximation to $V(\pi)$, is to the true value of $V(\pi)$. To
answer this, consider the operator $Au(\pi)$ defined in equation (7),
and let

$$\| Au - u \| = \max_\pi |Au(\pi) - u(\pi)| \tag{19}$$

Since the operator $A$ is a contraction mapping with
$\| Au - Av \| \le \beta \| u - v \|$, it can be shown that
$\| A^{n+1}u - A^n u \| \le \beta^n \| Au - u \|$ and
$\| A^n u - u \| \le (1 - \beta^n) \| Au - u \| / (1 - \beta)$, and
$V(\pi) = \lim_{n \to \infty} A^n u$. It follows that

$$|V(\pi) - v(\pi)| \le \| Av - v \| / (1 - \beta) \tag{20}$$
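The two intermediate inequalities follow from the contraction property by a telescoping sum. The steps below are a sketch of that standard argument, written for a generic $u \in B$; taking $u$ to be the series approximation $v$ then gives (20).

```latex
\begin{align*}
\| A^n u - u \|
  &\le \sum_{k=0}^{n-1} \| A^{k+1} u - A^k u \|
   \le \sum_{k=0}^{n-1} \beta^k \, \| Au - u \|
   = \frac{1 - \beta^n}{1 - \beta} \, \| Au - u \| , \\
\| V - u \|
  &= \lim_{n \to \infty} \| A^n u - u \|
   \le \frac{\| Au - u \|}{1 - \beta} .
\end{align*}
```

The first inequality is the triangle inequality, and the second applies the contraction bound $k$ times to each term.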

One could find a local maximum to $|Av - v|$ by an incremental
procedure similar to that used to find the most unsatisfied constraint
to introduce into the linear programming problem. Alternatively, one
could enumerate (20) for all possible probability vectors whose
components are multiples of $1/n$.